Fall prediction in a quiet standing balance test via machine learning: Is it possible?

The elderly population is growing rapidly worldwide, and falls are becoming a major problem for society. Currently, clinical assessments of gait and posture include functional evaluations and objective and subjective scales. They are considered the gold standard for indicating optimal mobility and individual performance, but their sensitivity and specificity are not good enough to predict who is at higher risk of falling. An innovative approach to fall prediction is machine learning, a computer-science area that applies statistics and optimization methods to large amounts of data to predict outcomes. This study therefore aimed to assess the performance of machine learning algorithms in classifying participants by age, number of falls and fall frequency, based on features extracted from a public database of stabilometric assessments. The sample comprised 163 participants (116 women and 47 men) between 18 and 85 years old, with 44.0 to 75.9 kg mass, 140.0 to 189.8 cm height, and 17.2 to 31.9 kg/m2 body mass index. Six machine learning algorithms were tested for this classification: Logistic Regression, Linear Discriminant Analysis, K Nearest-neighbours, Decision Tree Classifier, Gaussian Naive Bayes and C-Support Vector Classification. The algorithms were applied to this database, which contains sociocultural, demographic, and health status information about the participants. All models were able to classify the participants as young or old, but our main goal was not achieved: no model identified participants at high risk of falling. Our conclusion corroborates other works in the biomechanics field, which argue that static posturography, probably because of its low specificity to daily living activities, does not have the desired effectiveness in predicting the risk of falling. Further studies should focus on dynamic posturography to assess the risk of falls.


Introduction
For the elderly, the world's average prevalence of falls is 26.5% (95% CI 23.4-29.8%) [1]. As the elderly population grows rapidly worldwide [2], falls are becoming a huge social problem. Currently, clinical assessments of gait and posture include functional evaluations and objective and subjective scales. They are considered the gold standard for indicating optimal mobility and individual performance [3], but their sensitivity and specificity are not good enough to predict falls [4]. Although they are easily applied in clinical practice, they are not reliable for daily living activities [4]. Novel approaches are emerging to account for real-time fall prediction and future fall prediction, both of which are necessary to improve health care systems and both of which rely on proper fall risk assessment [5,6]. The prognosis and early selection of an intervention to reduce falls, or to prepare individuals for them, demand an efficient predictive tool to identify older adults at higher risk of falling [5,7].
A novel approach to fall prediction is machine learning (ML). ML is a computer-science field that applies statistics and optimization methods to large amounts of data to predict outcomes [8][9][10]. When fed with new data, ML focuses on classification, which involves choosing the subgroup that best describes a new data instance, and on prediction, which involves estimating an unknown parameter. ML is more efficient in predictions than traditional statistical analysis [11][12][13]. In the medicine, health and wellness fields, ML has improved clinical decision-making, assisting in the diagnosis of Parkinson's disease [14] and breast masses [15], and even aiding in the classification of health-related quality of life for older adults with chronic disease [16].
ML has been applied to classify physical activities and movement patterns, including gait. It is efficient in separating the gait patterns of young and older adults [17], identifying older adults who do or do not fall recurrently [18], detecting falls with different sensors and sensor locations [6,19-23], predicting severe fall-related outcomes and injuries [24], and predicting fallers' mortality [25]. For stratifying the elderly at risk of falling, ML has been applied to medical records and organizational factors in nursing homes [26], to patient demographics, historical visits, visit patterns and diagnosis data after emergency department visits [27], and to gait spatial-temporal parameters [28][29][30]. ML has also been applied to static balance tests, but its efficacy in distinguishing individuals at higher fall risk is controversial and its value in predicting falls remains unclear [31][32][33].
Thus, the aim of this study was to analyze the stabilometric assessment of adult balance using ML to classify people with different annual frequencies of accidental falls in a public database related to human balance [34]. Six different ML algorithms were tested for this classification: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K Nearest-neighbours (KNN), Decision Tree Classifier (CART), Gaussian Naive Bayes (NB) and C-Support Vector Classification (SVC). A comparison of their performances is also presented. To the best of our knowledge, this is the first study to apply ML to a static balance test across a wider age spectrum, which could mark clearer differences among participants. Our hypothesis was that supervised ML would detect patterns in a static balance test that characterize non-fallers (who had not fallen in the year before data collection) and fallers (who had fallen at least once in the year before data collection).

Method
This study used a public data repository [34]. All participants (163; 116 women and 47 men), between 18 and 85 years old, were informed about the study's purpose and procedures and gave free informed consent, the terms of which were approved by the UFABC ethics committee (#842529/2014) [34]. The ML algorithms were applied to this database, which contains sociocultural, demographic and health status information about the participants.

Participants
The participants' main characteristics were: 44.0 to 75.9 kg mass, 140.0 to 189.8 cm height, and 17.2 to 31.9 kg/m2 body mass index. Ten percent of the adults had one or more severe disabilities, including hearing, vestibular, visual, intellectual, and musculoskeletal deficits. For more information, refer to [34].

Balance evaluation
Four experimental conditions were analyzed: standing on a hard surface with eyes open or closed, and standing on a foam surface with eyes open or closed. All conditions were recorded for 60 s on a force platform (OPT400600-1000 model, AMTI, Watertown, MA, USA), with the feet at an angle of 20 degrees and the heels 10 cm apart. Ground reaction forces and moments of force were recorded (100 Hz sampling frequency, Optima Signal Conditioner, AMTI, Watertown, MA, USA), low-pass filtered (10 Hz 4th order Butterworth filter) and used to calculate the center of pressure (COP). There were three trials in each condition for every individual.
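The filtering and COP calculation described above can be sketched as follows. This is a minimal illustration, not the processing code of the database: the function name `center_of_pressure`, the zero-phase use of `filtfilt`, and the convention COP_AP = -My/Fz, COP_ML = Mx/Fz (moments taken about the plate surface origin) are assumptions, and the signals are synthetic.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def center_of_pressure(fz, mx, my, fs=100.0, cutoff=10.0, order=4):
    """Low-pass filter force-plate signals and return (cop_ap, cop_ml) in metres.

    Assumption: moments are expressed about the plate surface origin, so
    COP_AP = -My / Fz and COP_ML = Mx / Fz. The real plate calibration may differ.
    """
    b, a = butter(order, cutoff, btype="low", fs=fs)
    # filtfilt applies the filter forward and backward (zero phase lag).
    # Assumption: the source does not say whether filtering was zero-phase.
    fz_f, mx_f, my_f = (
        filtfilt(b, a, np.asarray(s, dtype=float)) for s in (fz, mx, my)
    )
    return -my_f / fz_f, mx_f / fz_f


# Synthetic 60 s trial at 100 Hz: constant load of 700 N and constant moments.
n_samples = 6000
fz = np.full(n_samples, 700.0)   # vertical force (N)
mx = np.full(n_samples, 7.0)     # moment about x (N m)
my = np.full(n_samples, -14.0)   # moment about y (N m)
cop_ap, cop_ml = center_of_pressure(fz, mx, my)
```

With these constant inputs the COP stays at a fixed point (0.02 m anterior, 0.01 m to the right), which is a quick sanity check of the sign convention before using real trials.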

Analyzed parameters
Features were calculated from the COP in the anteroposterior (AP; x-positive is anterior) and medio-lateral (y-positive is to the right) directions for all three attempts of each subject in each test configuration (standing on a hard surface with eyes open, standing on a hard surface with eyes closed, standing on a foam surface with eyes open and standing on a foam surface with eyes closed): area, mean velocity, maximum excursion, and RMS value. Each question from the Mini Balance Evaluation Systems Test (Mini-BESTest), Short Falls Efficacy Scale International (Short FES-I), International Physical Activity Questionnaire-Short Version (IPAQ-SV) and Trail Making Test (TMT) was analyzed separately. Thus, the features of the AI models were all 53 biomechanical parameters obtained from the COP tests plus demographic data: age, sex, height, weight and BMI.
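As an illustration of the four COP features named above, the sketch below computes them for one trial. The exact definitions used for the database are not given in this section, so the choices here are assumptions: area is taken as the 95% confidence ellipse area, maximum excursion as the range of the signal, and mean velocity as path length per direction divided by trial duration. The function name `cop_features` is hypothetical.

```python
import numpy as np


def cop_features(cop_ap, cop_ml, fs=100.0):
    """Return a dict of sway features for one trial (definitions are assumptions)."""
    ap = np.asarray(cop_ap, dtype=float)
    ml = np.asarray(cop_ml, dtype=float)
    ap = ap - ap.mean()  # remove the mean position so RMS measures sway only
    ml = ml - ml.mean()
    duration = (len(ap) - 1) / fs

    # Assumed area definition: 95% confidence ellipse,
    # pi * chi2(0.95, 2 dof) * sqrt(det(covariance)), with chi2 = 5.991.
    cov = np.cov(ap, ml)
    area = np.pi * 5.991 * np.sqrt(max(np.linalg.det(cov), 0.0))

    features = {"area": area}
    for name, signal in (("ap", ap), ("ml", ml)):
        features[f"mean_velocity_{name}"] = np.abs(np.diff(signal)).sum() / duration
        features[f"max_excursion_{name}"] = signal.max() - signal.min()
        features[f"rms_{name}"] = np.sqrt(np.mean(signal ** 2))
    return features


# Tiny synthetic trial sampled at 1 Hz, only to make the arithmetic easy to follow.
demo = cop_features([0, 1, 0, -1, 0], [1, 0, -1, 0, 1], fs=1.0)
```

In the demo the AP signal travels 4 units in 4 s, so its mean velocity is 1 unit/s and its excursion is 2 units.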
In this work, the following six supervised learning models are used:
Logistic Regression (LR). This method applies a statistical model to separate binary outcomes, for example, whether an email is spam or not. A relationship between the expected result and the independent variables is established by estimating probabilities through a cumulative logistic function. In this model, the probabilities that describe the possible outcomes of a single trial are modeled using a logistic function [35], such as:
$$f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$$
where $x_0$ is the value of $x$ at the midpoint of the sigmoid curve, $L$ is the maximum value of the curve and $k$ is the slope of the curve.
Linear Discriminant Analysis (LDA). Linear Discriminant Analysis is a classifier that employs a linear decision surface. Among its advantages, LDA has closed-form solutions that can be easily computed. LDA is also inherently multiclass and has no hyperparameters to tune [36].

K-nearest neighbors (KNN). This is a non-parametric algorithm, that is, it makes no assumption about the distribution of the input data. It is based on searching for similarities: data points are classified using the distance between them, looking for their k nearest neighbors. Each data point is then assigned to the most common class among these k neighbors [37].
Decision Tree (CART). Decision Trees are a non-parametric supervised learning method employed in classification and regression. The objective is to create a model that predicts the value of a variable by learning simple decision rules inferred from the data features. A tree can be seen as a piecewise constant approximation of a function. The deeper the tree, the more complex the decision rules will be [38].

Gaussian Naive Bayes (NB). Naive Bayes methods are a set of algorithms based on applying Bayes' theorem with the naive assumption of conditional independence between every pair of features [39]. The likelihood of the features is assumed to be Gaussian, as shown in the equation below, where $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood [39]:
$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$
C-Support Vector Classification (SVC). Support vector machines (SVM) are a set of supervised learning methods used for classification, regression and outlier detection. For input vectors with N dimensions, the decision boundary is a hyperplane of dimension N-1. In general, these algorithms construct the hyperplane (or set of hyperplanes) that separates the classes, and this boundary is then used to make predictions [40].
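The six models described above correspond to estimator classes in scikit-learn. The sketch below instantiates them and fits each one to a small synthetic two-class data set. The helper name `build_models`, the hyperparameters (library defaults apart from `max_iter` and `random_state`) and the data are assumptions made for illustration; the settings actually used in the study are not stated in this section.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Return the six classifiers keyed by the abbreviations used in the text."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(),
        "CART": DecisionTreeClassifier(random_state=random_state),
        "NB": GaussianNB(),
        "SVC": SVC(),
    }


# Synthetic, well-separated classes (not the posturography data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(40, 3)),
    rng.normal(2.0, 1.0, size=(40, 3)),
])
y = np.array([0] * 40 + [1] * 40)

# Training accuracy only; it says nothing about generalization.
train_accuracy = {
    name: model.fit(X, y).score(X, y) for name, model in build_models().items()
}
```

Keeping the models in a dictionary makes it straightforward to run the same evaluation loop over all of them and compare their scores side by side.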
The performance of an ML program can be evaluated with parameters linked to the classification capacity of the model, such as precision, sensitivity and accuracy:
Precision. Refers to the repeatability of the results [40]. It is defined by dividing the number of true positives (TP) by the total number of positives found, TP+FP (FP stands for false positives):
$$\text{Precision} = \frac{TP}{TP + FP}$$
Sensitivity or recall. Refers to the model's ability to avoid false negatives [40]. It is defined by dividing the number of true positives by the total number of elements that belong to the positive group, TP+FN (FN stands for false negatives):
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
Accuracy. Refers to how close a measured variable is to its actual value [40]. It is defined as the proportion of true results (true positives TP and true negatives TN) among the total results:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Ideally, precision, sensitivity and accuracy should be equal to 1 (or 100%); however, values very close to 1 may indicate overfitting of the training data.
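As a worked example of these three definitions, the sketch below computes them from confusion-matrix counts. The function name `classification_metrics` and the counts are hypothetical and are not results of this study. Returning 0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, sensitivity, accuracy) from confusion-matrix counts."""
    # Convention (assumption): an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, sensitivity, accuracy


# Hypothetical validation split of 33 subjects, 7 of them fallers.
precision, sensitivity, accuracy = classification_metrics(tp=3, fp=2, fn=4, tn=24)
```

Here precision is 3/5, sensitivity is 3/7 and accuracy is 27/33, which shows how accuracy can look acceptable while more than half of the positive cases are missed.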
The Santos and Duarte public database [34] was analyzed with the aforementioned supervised ML algorithms. The models were trained with all 53 biomechanical parameters obtained from the force platform. All data were normalized to z-scores, that is, expressed as deviations from the mean of each attribute in units of its standard deviation. Missing data were replaced with the mean of that specific attribute. For supervised algorithms, the analysis was carried out using data from week zero, while data from subsequent weeks were used to validate this training and to observe whether there was a change in individual classification or any temporal trend. A code authored by Jason Brownlee [41] was adapted to analyze the performance of the different algorithms.
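The mean imputation and z-score normalization described above can be sketched as below. This is an illustrative implementation under assumptions: the function name `impute_and_standardize` is hypothetical, the statistics are computed over all rows passed in (the text does not say whether they were estimated on the training split only), and constant columns are left at zero rather than divided by zero.

```python
import numpy as np


def impute_and_standardize(X):
    """Replace NaNs by the column mean, then convert each column to z-scores."""
    X = np.array(X, dtype=float)  # copy, so the caller's data is not modified
    col_mean = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_mean[cols]

    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (X - X.mean(axis=0)) / std


# Three subjects, two features, one missing value in the second feature.
X_demo = [[1.0, 10.0], [3.0, float("nan")], [5.0, 30.0]]
Z = impute_and_standardize(X_demo)
```

After the transformation every column has zero mean and unit standard deviation, and the imputed entry sits exactly at the column mean (z = 0).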
The performance of the different algorithms was assessed by randomly dividing the data into two groups: 80% of the data were used for training, and the remaining 20% for validation tests. This split was repeated at least 10 times in order to avoid accidental bias due to the splitting.
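A repeated 80/20 hold-out evaluation of this kind might look like the sketch below. It is not the adapted code cited in the text: the function name `repeated_holdout`, the use of the loop index as random seed, the unstratified split and the synthetic data are all assumptions. It reports the mean and standard deviation of accuracy and sensitivity over the repetitions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split


def repeated_holdout(model, X, y, n_repeats=10, test_size=0.2):
    """Return ((acc_mean, acc_std), (sens_mean, sens_std)) over random splits."""
    acc, sens = [], []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        y_pred = model.fit(X_tr, y_tr).predict(X_te)
        acc.append(accuracy_score(y_te, y_pred))
        # zero_division=0 reports sensitivity as 0 if a split has no positives.
        sens.append(recall_score(y_te, y_pred, zero_division=0))
    return (np.mean(acc), np.std(acc)), (np.mean(sens), np.std(sens))


# Synthetic data with the same class sizes as the fall-frequency labels (128 vs 35).
rng = np.random.default_rng(0)
y = np.array([0] * 128 + [1] * 35)
X = rng.normal(size=(163, 4)) + y[:, None] * 1.5

(acc_mean, acc_std), (sens_mean, sens_std) = repeated_holdout(
    LogisticRegression(max_iter=1000), X, y
)
```

With unbalanced classes, passing `stratify=y` to `train_test_split` would keep the class proportions equal in every split; whether the study did so is not stated.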
In supervised learning, the training data must be labeled a priori. To explore different prediction possibilities, three types of classification, as suggested by Santos and Duarte (2016), were performed, each in a separate program:
• Age: Young: Age < 60 (88 young subjects); Old: Age ≥ 60 (75 elderly subjects).
• Falls: how many non-intentional falls the subject had in the last 12 months, as declared by themselves. No falls (zero falls; 120 subjects); One fall or more (from 0 to ...; 43 subjects).
• Fall frequency: how many non-intentional falls the subject had in the last 12 months, as declared by themselves. Non-faller (from 0 to 1; 128 subjects); Faller (from 1 to ...; 35 subjects).
All code was developed in Python 3.7. The ML algorithms belong to the sklearn library and were implemented via these classes: LogisticRegression, LinearDiscriminantAnalysis, DecisionTreeClassifier, GaussianNB, SVC and KNeighborsClassifier.
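The three labeling schemes listed above can be expressed as a small function. The name `label_subject` and the label strings are hypothetical. The fall-frequency boundary is an assumption: the text gives overlapping ranges, and the sketch treats subjects with at most one fall as non-fallers and those with two or more as fallers.

```python
def label_subject(age, falls_last_12_months):
    """Return the three class labels for one subject."""
    return {
        "age": "old" if age >= 60 else "young",
        "falls": "one_or_more" if falls_last_12_months >= 1 else "no_falls",
        # Assumption: 0 or 1 fall counts as non-faller, 2 or more as faller.
        "fall_frequency": "faller" if falls_last_12_months >= 2 else "non_faller",
    }


# A 72-year-old who reported a single fall in the last 12 months.
labels = label_subject(age=72, falls_last_12_months=1)
```

Under these assumptions the same subject is counted as having fallen in the "Falls" scheme but as a non-faller in the "Fall frequency" scheme, which is why the two schemes have different class sizes.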

Results
In our study, precision is the ratio of correctly predicted falls to the total number of predicted falls (correct and incorrect); sensitivity is the ratio of correctly predicted falls to the total number of reported falls; and accuracy is the ratio of correctly predicted falls and non-falls to the total number of observations of falls and non-falls. The results in Tables 1 to 3 (values within parentheses represent the one standard deviation uncertainty of the calculation) show that none of the ML algorithms was able to discriminate non-fallers from fallers, given the low sensitivity of the different algorithms (even when one group presented a value close to 1, the other group presented a lower value).

Discussion
This study applied six different supervised ML algorithms to classify 163 adults in different ways, but mainly to find out whether non-fallers (who had not fallen in the year before data collection) and fallers (who had fallen at least once in the year before data collection) could be separated, with the aim of creating a tool able to predict, from an easily applicable test, which adults are at greater risk of falling. The ML algorithms used 53 features to build these models, which were not capable of classifying the participants at higher risk of falling in our sample. The ML methods performed well only when classifying the participants into young and old adults based on their static balance test.
Analyzing the models' performance (Tables 1-3), all algorithms had satisfactory to good results for the three types of classification, since all accuracies were above a 60% threshold, with a maximum mean accuracy of 0.91(51) for the logistic regression regarding the separation between faller and non-faller individuals. However, low values of precision and sensitivity were observed for the fall classifications; in some cases, these values were null (LR, LDA and SVM algorithms). This means that no data point of the faller class was correctly labeled during the validation tests. This behavior is related to the nature of the dataset used, where the event classes (fallers and non-fallers) are unbalanced. Such classification is precisely the objective of this work. Therefore, the ML methods performed well only when used to classify a participant as young or old based on the static balance test, as can be seen in Fig 1a, and failed in the classifications related to falls.
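The combination of acceptable accuracy and null sensitivity can be reproduced with simple arithmetic. The sketch below is only an illustration: it assumes a hypothetical classifier that labels every subject as a non-faller and applies it to the whole sample of the fall-frequency scheme (128 non-fallers, 35 fallers) rather than to a validation split.

```python
# Class sizes of the fall-frequency labels reported in the Method section.
n_non_fallers, n_fallers = 128, 35

# Hypothetical classifier that always predicts "non-faller".
tp, fn = 0, n_fallers        # every faller is missed
tn, fp = n_non_fallers, 0    # every non-faller is labeled correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
print(f"accuracy={accuracy:.3f}, sensitivity={sensitivity:.3f}")
# prints: accuracy=0.785, sensitivity=0.000
```

An accuracy of about 0.79 is therefore attainable without identifying a single faller, which is why accuracy alone cannot be used to judge the fall classifications.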
Although this result is disappointing from a health care and technological perspective, since it does not support an easy way of detecting people at high risk of falling, it is a warning sign against the widespread use of standardized balance measures in the adult population to prevent falls.
The postural control research field offers a large number of independently standardized tests and measures to assess balance in adults (e.g., Activity-based Balance Level Evaluation scale, BESTest, Berg Balance Scale, Community Balance and Mobility scale). Besides not encompassing some components that contribute considerably to balance [42], this variety of tests hinders the comparison of results among them and prevents a better understanding of the relationship between performance in balance tests and the risk of falls.
There is some evidence that mediolateral sway with eyes open [33,43], average speed of the COP [44] and area of COP movement [45] correlate positively with the number of falls. Nonetheless, other studies, like ours (illustrated in Fig 1b and 1c), did not find a relation between static balance test performance and risk of falls. A hypothesis raised by Buatois 2006 was that static posturographic tests were not sensitive enough for active and independent participants, since the studies that found this relationship had a significantly older population or were community-dwelling elderly [32]. In their systematic review, Piirtola et al., 2006 did not find clear differences between the participants of studies that showed a positive association between sway parameters and falls and those of studies in which no such association was found [46]. Recently, Cabral et al., 2020 applied different ML methods to a large dataset of community-dwelling elderly fallers aged 60 to 88 years old, an age group that should increase the test sensitivity. That study found that static posturography does not improve the prediction of recurrent and single fallers [31]. We hypothesized that the comparison with young people could mark greater differences between non-fallers and fallers, but the algorithms differentiated only young and old people. The principal component analysis showed that the main parameters differentiating these age groups were COP parameters: the range of motion and the velocity in the mediolateral direction seem to contribute the most to this difference. These findings corroborate that the decrease in balance control is age-related, as shown in a systematic review: in a static posturographic test with eyes open or closed, the differences in COP displacement between two age groups can vary from 20% to 30% in the anteroposterior direction and from 40% to 50% in the mediolateral direction; for velocity, the age differences range from 30% to 50% in both directions [47].
Our algorithms identified the age groups through the static balance parameters. In contrast, they were not successful in differentiating participants with and without a history of falls, which may indicate that we still do not know, or cannot control, which parameters contribute most to predicting the occurrence of falls.
The etiology of falls is multifactorial [48], so an isolated test has low specificity in predicting fall risk. It is highly likely that static tests are not challenging enough to generate a disturbance, even in participants who have some kind of balance impairment. Furthermore, there is a clear lack of specificity, since almost all falls in daily living activities occur in dynamic situations [49,50].
A limitation of this study is that the retrospective fall incidence was based on self-reported data. A sample size analysis was not performed, and the number of participants in the age groups is unbalanced. The sample size is also a detrimental factor for machine learning analyses, which generally require a larger amount of data.
Corroborating other studies, ours showed that static posturography is not effective for fall risk prediction. These results suggest that future studies should focus on dynamic posturography to assess the risk of falls through single tests.

Conclusion
This study applied six different machine learning algorithms in order to test the possibility of using these methods to classify adults into non-fallers and fallers, creating a tool capable of predicting adults at greater risk of falling.However, none of the models was able to classify the participants at greatest risk of falling in this sample.
Our conclusion corroborates other works in the biomechanics field, which argue that static posturography does not have the desired effectiveness in predicting the risk of falling. This suggests that further studies should focus on dynamic posturography to assess the risk of falls through single tests.
From the point of view of the ML tools, the computational methodology was correctly applied; the failure to obtain the expected results seems to derive only from the initial biomechanical assumptions and from the behavior of the data itself.

Table 2. Sensitivity values for different machine learning algorithms applied to sort COP data according to age, falls and fall frequency. https://doi.org/10.1371/journal.pone.0296355.t002
A graphic comparison of all algorithms' accuracy and sensitivity values can be seen in Fig 1. This data visualization shows the ML algorithms were only effective in distinguishing young from elderly.